Reconfigurable system

ABSTRACT

A reconfigurable processor (VPU) is designed for a technical environment having a standard processor (CPU) such as a DSP, RISC or CISC processor or a (micro)controller. This design permits the simplest and most efficient possible connection. Another aspect is the simple programmability of the resulting system. Continued use of existing programs on the CPU, code compatibility and simple integration of the VPU into the existing programs are taken into account.

FIELD OF THE INVENTION

[0001] The present invention relates to reconfigurable processors. In particular, the present invention addresses connecting a reconfigurable processor to a standard processor in a particularly favorable manner.

BACKGROUND INFORMATION

[0002] A reconfigurable architecture is understood in the present case to be modules (VPUs) having a configurable function and/or interconnection, in particular integrated modules having a plurality of arithmetic and/or logic and/or analog and/or memory and/or internal/external interconnecting units that are configured in one or more dimensions and are interconnected directly or via a bus system.

[0003] Generic modules of this type include in particular systolic arrays, neural networks, multiprocessor systems, processors having a plurality of arithmetic units and/or logic cells and/or communicative/peripheral cells (IO), interconnecting and networking modules such as crossbar switches as well as conventional modules of the generic types FPGA, DPGA, Chameleon, XPUTER, etc. Reference is made in this context in particular to the following protective rights of the present applicant: P 44 16 881.0-53, DE 197 81 412.3, DE 197 81 483.2, DE 196 54 846.2-53, DE 196 54 593.5-53, DE 197 04 044.6-53, DE 198 80 129.7, DE 198 61 088.2-53, DE 199 80 312.9, PCT/DE 00/01869, DE 100 36 627.9-33, DE 100 28 397.7, DE 101 10 530.4, DE 101 11 014.6, PCT/EP 00/10516, EP 01 102 674.7, DE 196 51 075.9-53, DE 196 54 846.2-53, DE 196 54 593.5-53, DE 197 04 728.9, DE 198 07 872.2, DE 101 39 170.6, DE 199 26 538.0, DE 101 42 904.5, DE 101 10 530.4, DE 102 02 044.2, DE 102 06 857.7, DE 101 35 210.7-53, EP 02 001 331.4, 60/317,876. These are herewith incorporated to the full extent for disclosure purposes.

[0004] The aforementioned architecture is used as an example for illustration and is referred to below as VPU. This architecture is composed of any arithmetic cells, logic cells (including memories) and/or memory cells and/or networking cells and/or communicative/peripheral (IO) cells (PAEs) which may be arranged to form a one-dimensional or multidimensional matrix (PAC), the matrix optionally having different cells of any type; bus systems are also understood to be cells here. The matrix as a whole or parts thereof are assigned a configuration unit (CT) which influences the interconnection and function of the PAC.

SUMMARY

[0005] A reconfigurable processor (VPU) is designed into a technical environment having a standard processor (CPU) such as a DSP, RISC or CISC processor or a (micro)controller. This design permits the simplest possible connection, which is nevertheless very efficient. Another feature is the simple programming of the resulting system. Continued use of existing programs of the CPU and code compatibility and simple integration of the VPU into existing programs with no problem are taken into account by the method described here.

[0006] Reconfigurable modules (VPUs) of different generic types (such as PACT XPP technology, Morphics, Morphosys, Chameleon) are generally incompatible with existing technical environments and programming methods.

[0007] The programs of the modules are also incompatible with pre-existing CPU programs. This necessitates enormous developing effort in programming, e.g., in particular for modules of the generic Morphics and Morphosys types. Chameleon already has a standard processor (ARC) integrated into the reconfigurable modules. Thus, the tools for programming are available. However, not all technical environments are suitable for use of ARC processors, and in particular existing programs, code libraries, etc. are often provided for any indeterminate other CPUs.

[0008] In accordance with the present invention, VPU (or a plurality of VPUs, without having to mention this specifically each time) is connected to a preferred CPU in such a way that it assumes the position and function of a coprocessor. The function as coprocessor permits the simple tie-in to existing program codes according to the pre-existing methods for handling coprocessors according to the related art.

[0009] This system may be designed in particular as a (standard) processor or unit and/or integrated into a semiconductor (system on chip, SoC).

[0010] In order to provide the coprocessor link between the CPU and the VPU, an exchange of data, i.e., information between the CPU and VPU, is necessary. In particular, the processor must typically relay data and instructions about what must be done to the data. The data exchange between the CPU and VPU may take place via memory linkage and/or IO linkage. The CPU and VPU may in principle share all the resources. In particular embodiments, however, it is also possible for the CPU and VPU to jointly use only some of the resources, while other resources are available explicitly and exclusively for the CPU or the VPU. The question of which variant is preferred will typically depend on, among other things, the overall layout of the system, the possible cost, available resources and the expected data load. It should be pointed out that whenever reference is made to a single CPU, this may also be understood to refer to a plurality of CPUs together.

[0011] To perform a data exchange, data records and/or configurations may be copied and/or written/read in memory areas provided specifically for this purpose and/or corresponding basic addresses may be set so that they point to the particular data ranges.

[0012] In one preferred variant, for controlling the coprocessor, a data record containing the basic settings of a VPU such as, for example, certain basic addresses, is provided. In addition, status variables may also be provided in the data record for triggering and function control of a VPU by a CPU and/or for separate transmission and may be exchanged with or separately from data. In a particularly preferred variant, the addresses may be flexibly distributed and allocated. Thus preferably only one basic address in the I/O address space or the memory address space need be fixedly agreed upon to be used with its data record as a pointer to the flexibly defined addresses.

[0013] The data record may be exchanged via a common memory (RAM) and/or a common peripheral address base (IO). The addresses may be flexibly distributed and allocated.

[0014] For synchronization of the CPU and VPU, unidirectional or mutual interrupt methods (e.g., interrupt lines) may be provided and/or synchronization may be performed via polling methods. In addition, interrupts may also be used for synchronizing data transfers and/or DMA transfers. In one embodiment that is particularly preferred, a VPU is started by a CPU and then independently thereof it runs the application which has been started, i.e., instructed.

[0015] A preferred structure in which the VPU used provides its own mechanisms for loading and controlling configurations is particularly efficient. For example, PACT XPP and Chameleon belong to the generic type of these VPUs. The circuits according to the present invention permit a method for operation so that some or all configurations of the VPU together with the program of the CPU to be executed are loaded into a memory. During execution of the program, the CPU may refer to the memory locations (e.g., via addresses or pointers), each containing the particular configurations to be executed. The VPU may then automatically load the configurations without any further influence by the CPU. If and to the extent that the VPU, i.e., the reconfigurable field having particularly coarse-grained runtime-configurable elements, has a load logic for loading configurations, it may be sufficient if the CPU issues an instruction to the load logic to load a certain configuration. The call to the reconfigurable processor, which then functions as the coprocessor, may thus preferably be issued via a single instruction to the load logic. It should be pointed out that by prior agreement between the VPU and CPU, i.e., the calling host processor, it is possible to stipulate precisely which configuration is to be executed by which call. It should also be pointed out here that suitable control means may be provided in the load logic unit, whether dedicated, implemented or formed by one or more reconfigurable cells of the reconfigurable processor. Execution begins immediately or, if necessary, is begun via additional information (e.g., interrupt and/or start instructions) by the CPU.

[0016] In a particularly preferred further embodiment, the VPU is able to independently read and write data within one or more memories, some of which may be shared with or independent of the CPU.

[0017] In a particularly preferred further embodiment, the VPU may also independently load new configurations out of the memory and reconfigure them as needed without requiring any additional influence by the CPU.

[0018] These embodiments permit operation of VPUs mostly independently of CPUs. Only synchronization exchange between the CPU and VPU, which is preferably bidirectional, should additionally be provided to coordinate the data processing and/or configuration execution sequences.

[0019] The sequence control of a VPU may be accomplished directly by a program executed on the CPU, which more or less constitutes the main program which swaps out certain subprograms to the VPU. This variant is particularly easy to implement.

[0020] However, mechanisms controlled via the operating system (in particular the scheduler) are preferably used for synchronization and sequence control. Whenever possible, a simple scheduler in particular may perform the following after transfer of the function to the VPU:

[0021] 1. allow the current main program to continue running on the CPU if it is able to run independently of, and simultaneously with, data processing on a VPU; additionally and/or alternatively,

[0022] 2. if or as soon as the main program must wait for the end of data processing on the VPU, the task scheduler switches to another task (e.g., another main program). The VPU may continue working in the background regardless of the CPU task currently at hand.

[0023] Each newly activated task will typically (if it uses the VPU) check before use to determine whether it is available for data processing or whether it is currently still processing data in a manner which blocks the required VPU resources. It is then necessary to wait either for the end of data processing or, if preferable according to priority, for example, the task must be changed.
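
The availability check described above may be sketched as follows. This is only an illustrative model: the function name, the return labels and the rule that a larger number means a higher priority are assumptions and are not taken from the description.

```python
def on_vpu_needed(task_priority, vpu_busy, best_waiting_priority=None):
    """Decide what a newly activated task does when it needs the VPU.

    Assumption: larger numbers mean higher priority. The source only says
    the task either waits or, if priority makes that preferable, is changed.
    """
    if not vpu_busy:
        return "use_vpu"          # VPU resources are free: start data processing
    if best_waiting_priority is not None and best_waiting_priority > task_priority:
        return "switch_task"      # another ready task is more urgent: change task
    return "wait"                 # otherwise wait for the end of data processing


# Example decisions for a task of priority 5
free_case = on_vpu_needed(5, vpu_busy=False)
switch_case = on_vpu_needed(5, vpu_busy=True, best_waiting_priority=7)
wait_case = on_vpu_needed(5, vpu_busy=True, best_waiting_priority=3)
```

In a real system this decision would sit inside the task scheduler of the operating system; the sketch isolates only the decision itself.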

[0024] A simple and nevertheless efficient method may be created and/or implemented in particular on the basis of so-called descriptor tables which may be implemented as follows, for example: For calling the VPU, each task generates one or more tables (VPUPROC) having a suitable defined data format in the memory area assigned to it. This table includes all the control information for a VPU, e.g., the program/configuration to be executed (or pointers to the corresponding memory locations) and/or memory location(s) (or pointers to each) and/or data sources (or pointers thereto) for the input data and/or the memory location(s) (or pointers thereto) for the operand or the result data.
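
One possible shape of such a descriptor table is sketched below. The description fixes no concrete data format, so the field names, the length field, the completion flag and the use of a dictionary as the task's memory area are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VpuProc:
    """Hypothetical VPUPROC layout; the source defines no concrete format."""
    config_addr: int      # pointer to the configuration to be executed
    input_addr: int       # pointer to the input data
    input_len: int        # assumed length field, not named in the source
    result_addr: int      # pointer to the location for the result data
    done: bool = False    # assumed completion flag


def make_vpuproc(task_memory, slot, config_addr, input_addr, input_len, result_addr):
    """Create a VPUPROC inside the memory area assigned to the calling task."""
    proc = VpuProc(config_addr, input_addr, input_len, result_addr)
    task_memory[slot] = proc
    return proc


# A dict stands in for the task's memory area; all addresses are made up.
task_memory = {}
proc = make_vpuproc(task_memory, slot=0x100, config_addr=0x3000,
                    input_addr=0x2000, input_len=16, result_addr=0x2400)
```

Holding pointers rather than the data itself keeps the table small and lets the VPU fetch configurations and operands on its own.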

[0025] For example, a table or a chained list (LINKLIST) may be found in the memory area of the operating system, pointing to all VPUPROC tables in the order in which they are created and/or called.

[0026] Data processing on the VPU is preferably performed so that a main program creates a VPUPROC and calls the VPU via the operating system. The operating system creates an entry in the LINKLIST. The VPU processes the LINKLIST and executes the particular VPUPROC referenced. The end of a particular data processing is preferably indicated by a corresponding entry in the LINKLIST and/or VPUCALL table which the CPU may query by regular polling, for example. As an alternative, interrupts may be used as an indicator from the VPU to the CPU and if necessary may also be used for exchanging the VPU status. It is possible here to indicate not only that the end of the program has been reached but also that a particular point in the subprogram has been reached, and if so, which point.
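
The call sequence of this paragraph may be modeled in a few lines. The sketch is single-threaded, so the polling loop itself advances the simulated VPU; in hardware the VPU would run on its own. The function names, the dictionary-based entries and the use of a Python callable as configuration K1 are assumptions.

```python
def os_call_vpu(linklist, vpuproc):
    """Operating system side: create a LINKLIST entry for a VPUPROC."""
    entry = {"proc": vpuproc, "done": False}
    linklist.append(entry)
    return entry


def vpu_step(linklist, configs, memory):
    """VPU side: execute the first unprocessed entry and mark it finished."""
    for entry in linklist:
        if not entry["done"]:
            p = entry["proc"]
            memory[p["result"]] = configs[p["config"]](memory[p["input"]])
            entry["done"] = True   # end of processing is recorded in the entry
            return True
    return False                   # nothing left to process


# Hypothetical configuration K1: doubles every input value.
configs = {"K1": lambda values: [2 * v for v in values]}
memory = {"in": [1, 2, 3], "out": None}
linklist = []

entry = os_call_vpu(linklist, {"config": "K1", "input": "in", "result": "out"})

polls = 0
while not entry["done"]:           # CPU polls the completion entry
    polls += 1
    vpu_step(linklist, configs, memory)
```

An interrupt-based variant would replace the polling loop with a callback invoked by the VPU side when it sets the completion entry.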

[0027] In this method, which is preferred according to the present invention, the VPU works largely independently of the CPU. In particular, the CPU and the VPU may perform independent and different tasks per unit of time. The operating system and/or the particular tasks need only monitor the tables (LINKLIST and/or VPUPROC).

[0028] As an alternative, LINKLIST may also be omitted by chaining the VPUPROCs to one another using pointers, as is known from lists, for example. VPUPROCs that have been processed are removed from the list and new ones are inserted. Programmers are familiar with this method which therefore need not be explained in greater detail here.
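
For completeness, the chaining alternative may be sketched as an ordinary singly linked list. The class and method names are hypothetical, and a real VPUPROC would carry the control information rather than only a name.

```python
class ChainedVpuProc:
    """VPUPROC carrying its own pointer to the next VPUPROC (assumed layout)."""

    def __init__(self, name):
        self.name = name
        self.next = None


class VpuProcChain:
    """Chain of VPUPROCs that replaces a separate LINKLIST."""

    def __init__(self):
        self.head = None
        self.tail = None

    def insert(self, proc):
        """Append a new VPUPROC at the end of the chain."""
        if self.tail is None:
            self.head = self.tail = proc
        else:
            self.tail.next = proc
            self.tail = proc

    def remove_processed(self):
        """Unlink and return the oldest VPUPROC, or None if the chain is empty."""
        proc = self.head
        if proc is None:
            return None
        self.head = proc.next
        if self.head is None:
            self.tail = None
        proc.next = None
        return proc


chain = VpuProcChain()
chain.insert(ChainedVpuProc("A"))
chain.insert(ChainedVpuProc("B"))
order = [chain.remove_processed().name]   # A is processed and removed
chain.insert(ChainedVpuProc("C"))         # a new VPUPROC is inserted meanwhile
order.append(chain.remove_processed().name)
order.append(chain.remove_processed().name)
```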

BRIEF DESCRIPTION OF THE DRAWINGS

[0029] FIG. 1 shows an example VPU.

[0030] FIG. 2 shows an example CPU system.

[0031] FIG. 3 shows an exemplary system.

[0032] FIG. 4 shows an example interface structure.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

[0033] FIG. 1 shows a particularly preferred VPU design. Configuration managers (CTs) (0101), preferably hierarchical, control and administer a system of reconfigurable elements (PACs) (0102). The CTs are assigned a local memory for configurations (0103). The memory also has an interface (0104) to a global memory which supplies the configuration data. The configuration sequences are controllable via an interface (0105). There is an interface of reconfigurable elements (0102) to sequence control and event management (0106); there is also an interface to data exchange (0107).

[0034] FIG. 2 shows a detail of an exemplary CPU system, e.g., a DSP of the C6000 type from Texas Instruments or a microcontroller from ARM (0201). Program memories (0202), data memories (0203), any peripherals (0204) and an EMIF (0205) are shown. A VPU (0208) is integrated as a coprocessor via a memory bus (0206) and a peripheral bus (0207). A DMA controller (EDMA) (0209) may perform any DMA transfers, e.g., between memory (0203) and VPU (0208) or memory (0203) and peripherals (0204). The VPU and/or the CPU may also access the memory independently without the assistance of a DMA. The shared memory may also be designed as a dual port memory or multiport memory in particular. Additional units, in particular reconfigurable FPGAs, may be assigned to the system to permit fine-grained processing of individual signals or data bits and/or to be able to establish flexible adaptable interfaces (e.g., various serial interfaces (V24, USB, etc.)), various parallel interfaces (hard drive interfaces, Ethernet, telecommunications interfaces (a/b, TO, ISDN, DSL, etc.)).

[0035] FIG. 3 shows a more abstract system definition. A CPU (0301) is assigned a memory (0302) to which it has read and/or write access. A VPU (0303) is connected to the memory. The VPU is divided into a CT part (0309) and the reconfigurable elements for data processing (0310).

[0036] To increase memory access throughput, the memory may have a plurality of independent access buses (multiport) which under some circumstances may be used simultaneously. In a particularly preferred embodiment, the memory is segmented into multiple independent segments (memory banks), each bank optionally being accessed independently. All segments are preferably within a uniform address space.

[0037] One segment is preferably available mainly for CPU (0304), another segment is available mainly for data processing by VPU (0305) and yet another segment is mainly available for the configuration data of VPU (0306).
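
A segmented memory within a uniform address space may be pictured as a simple address map. The base addresses and sizes below are invented for illustration; only the three segment roles (0304, 0305, 0306) come from the description.

```python
# (name, base address, size): hypothetical layout within one address space
SEGMENTS = [
    ("cpu", 0x0000, 0x4000),         # mainly for the CPU (0304)
    ("vpu_data", 0x4000, 0x4000),    # mainly for VPU data processing (0305)
    ("vpu_config", 0x8000, 0x2000),  # mainly for VPU configuration data (0306)
]


def locate(addr):
    """Map a global address to (segment name, offset within the segment)."""
    for name, base, size in SEGMENTS:
        if base <= addr < base + size:
            return name, addr - base
    raise ValueError(f"address {addr:#x} is outside the mapped segments")


def can_access_in_parallel(addr_a, addr_b):
    """Two accesses can proceed independently if they hit different banks."""
    return locate(addr_a)[0] != locate(addr_b)[0]


hit = locate(0x4010)
parallel = can_access_in_parallel(0x0000, 0x8000)
conflict = can_access_in_parallel(0x4000, 0x4FFF)
```

Because all segments share one address space, either processor can still reach every segment; the banking only determines which accesses may overlap in time.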

[0038] Typically and preferably a fully embodied VPU has its own address generators and/or DMAs to perform data transfers. Alternatively and/or additionally, it is possible for a DMA (0307) to be provided inside the system (FIG. 3) for data transfers with the VPU.

[0039] The system contains IO means (0308) to which the CPU and VPU may have access.

[0040] Both the CPU and VPU may have dedicated memory areas and IO areas to which the other does not have access.

[0041] A data record (0311) which may be in the memory area and/or in the IO area and/or partially in one of the two, as shown graphically, is used for communication between the CPU and VPU, e.g., for exchange of basic parameters and control information. The data record may include the following information, for example, and thus constitutes a basic settings data record:

[0042] 1. Basic address(es) of the CT memory area in 0306 for localizing the configurations,

[0043] 2. Basic address(es) of data transfers with 0305,

[0044] 3. I/O addresses of data transfers with 0308,

[0045] 4. Synchronization information, e.g., reset, stop, starting the VPU,

[0046] 5. Status information on the VPU, e.g., error or status of data processing.
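
The five items listed above may be pictured as a small fixed-layout record placed at one agreed address. The sketch below is purely illustrative: the 32-bit little-endian field widths, the bit assignments for reset, stop and start, and all addresses are assumptions, since the description specifies no encoding.

```python
import struct
from dataclasses import dataclass

# Hypothetical control bits; the source names reset, stop and start only.
CTRL_RESET, CTRL_STOP, CTRL_START = 0x1, 0x2, 0x4
LAYOUT = "<5I"   # five little-endian 32-bit words (assumed widths)


@dataclass
class BasicSettings:
    config_base: int   # 1. basic address of the CT memory area (0306)
    data_base: int     # 2. basic address of data transfers (0305)
    io_base: int       # 3. I/O address of data transfers (0308)
    control: int = 0   # 4. synchronization information
    status: int = 0    # 5. status information on the VPU

    def pack(self):
        return struct.pack(LAYOUT, self.config_base, self.data_base,
                           self.io_base, self.control, self.status)

    @classmethod
    def unpack(cls, raw):
        return cls(*struct.unpack(LAYOUT, raw))


RECORD_ADDR = 0          # the single fixedly agreed address (made up)
shared = bytearray(64)   # stands in for the common memory
size = struct.calcsize(LAYOUT)

# CPU side: write the record and request a start.
record = BasicSettings(0x8000, 0x4000, 0xF000, control=CTRL_START)
shared[RECORD_ADDR:RECORD_ADDR + size] = record.pack()

# VPU side: read the record back from the agreed address.
seen = BasicSettings.unpack(bytes(shared[RECORD_ADDR:RECORD_ADDR + size]))
```

Only `RECORD_ADDR` has to be fixed in advance; every other address is read from the record and may therefore be allocated flexibly.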

[0047] The CPU and VPU are synchronized by polling status data and/or status information and/or preferably by interrupt control (0312).

[0048] The basic setting data record may contain a LINKLIST and/or VPUCALLs or alternatively may point to the LINKLIST and/or VPUCALLs or to the first entry thereof by pointers.

[0049] FIG. 4 shows an example embodiment of the interface structure of a VPU for tying into a system like that in FIG. 3. The VPU here is assigned a memory/DMA interface and/or an IO interface for data transfer (0401). Another system interface (0402) takes over sequence control such as the management of interrupts, starting and stopping processing, exchange of error states, etc.

[0050] The memory/DMA interface and/or an IO interface is connected to a memory bus and/or an IO bus.

[0051] The system interface is preferably connected to an IO bus, but alternatively or additionally according to 0311 it may also be connected to a memory. Interfaces (0401, 0402) may be designed for adaptation of different working frequencies of the CPU and/or VPU and/or system and may have a clock matching circuit; for example, the system, i.e., the CPU, may operate at 400 MHz and the VPU at 200 MHz.

[0052] The interfaces may translate the bus protocols using a protocol matching circuit, e.g., the VPU-internal protocol may be converted to an external AMBA bus protocol or vice versa.

[0053] The memory/DMA interface and/or IO interface supports the memory access of the CT to an external memory which is preferably direct (memory mapped). The data transfer of the CT(s) and/or PAC(s) may be buffered, e.g., via FIFO stages. The external memory (e.g., 0308, 0203) may be addressed directly, and DMA-internal and/or external DMA transfers may be performed.

[0054] Data processing, e.g., initializing and/or startup of configurations, is controlled via the system interface. In addition, status and/or error states are exchanged. Interrupts for the control and synchronization between the CTs and a CPU may be supported.

[0055] The system interface may convert VPU-internal protocols to be implemented on external (standard) protocols (e.g., AMBA).

[0056] It should be pointed out that bus interfaces, RAM cells, I/O cells and the like may be provided as parts (PAEs) of a VPU. This is also true when these units are to be used for processor-coprocessor linkage.

[0057] A preferred method for generating code for the system described here is described in the PACT20 patent application (i.e., U.S. patent application Ser. No. 09/967,498), the full content of which is herewith incorporated for disclosure purposes. This method includes a compiler which splits the program code into a CPU code and a VPU code. The split between the different processors is performed by various methods. In one particularly preferred embodiment, the particular split codes are expanded by adding interface routines for communication between CPU and VPU. This expansion may also be performed automatically by the compiler.

[0058] The advantage according to the present invention is that management complexity and/or interface complexity as well as programming of the system according to the present invention are simple and inexpensive.

[0059] The following tables show examples of communications between a CPU and a VPU. The particular active function units are assigned to the columns (CPU, system DMA and DMA interface (EDMA), i.e., memory interface (memory IF), system interface (system IF, 0402), CTs and the PAC). The rows show the individual cycles in order of execution. K1 references configuration 1, which is to be executed.

[0060] The first table shows a sequence using the DMA (EDMA) system for data transfer as an example. Each row indicates a control process taking place sequentially. The columns show the particular activity in the corresponding module:

CPU EDMA System IF CTs PAC
Initiate K1 Load K1 Start K1 Configure K1 Initiate Start K1 Wait for Load data data via EDMA Initiate Data Data Read data transfer processing via EDMA Read data Data Signal transfer end of Write data operation

[0061] It should be pointed out that the EDMA and VPU are automatically synchronized via interface 0401, i.e., DMA transfers take place only when the VPU is ready for it.

[0062] A second table shows a preferred optimized sequence as an example. The VPU itself has direct access to configuration memory (0306). In addition, the data transfers are executed by a DMA circuit within the VPU, which may be fixedly implemented, for example (PACT03, i.e., U.S. Pat. No. 6,513,077) and/or result from the configuration of configurable parts of the PAC.

CPU EDMA System IF CTs PAC
Initiate K1 Start K1 Read Configure configuration K1 Data Start K1 Read data transfer Read data Data processing Data Signal Write data transfer end of Write data operation

[0063] The operating and synchronization complexity for the CPU is minimal, so that maximum performance is achieved.

[0064] In addition, according to this method a plurality of configurations may be executed in different areas of the VPU, i.e., in different PAEs or on the same resources by using a time multiplexing method.

[0065] In particular, a type of double buffering may be used for particularly simple and rapid reconfiguration in which a plurality of VPUs are provided, some of the VPUs optionally being reconfigured at a given point in time, while others perform computations and possibly yet others may be inactive. The data connections, trigger connections, status connections, etc. are exchanged in a suitable way among the plurality of VPUs and optionally interconnected through addressed buses and/or multiplexers/demultiplexers according to the VPUs currently active and/or to be reconfigured.
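
The double buffering idea may be illustrated with two VPUs that swap roles after every step. This is a scheduling sketch only: the names VPU0 and VPU1, the restriction to exactly two units and the assumption that the first configuration is already loaded are not taken from the description.

```python
def double_buffer_schedule(configs):
    """Return, per step, which VPU computes and which one is reconfigured.

    Assumes two VPUs and that configs[0] is already loaded on VPU0.
    """
    vpus = ("VPU0", "VPU1")
    active = 0
    steps = []
    for i, cfg in enumerate(configs):
        upcoming = configs[i + 1] if i + 1 < len(configs) else None
        standby = 1 - active
        steps.append({
            "compute": (vpus[active], cfg),             # runs the current configuration
            "reconfigure": (vpus[standby], upcoming),   # loads the next one meanwhile
        })
        active = standby   # roles swap; connections are switched to the other VPU
    return steps


steps = double_buffer_schedule(["K1", "K2", "K3"])
```

In the final step no further configuration is pending, so the standby unit is simply inactive, which corresponds to the inactive VPUs mentioned above.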

[0066] The full content of all the PACT patent applications identified above as well as their family members is herewith incorporated for disclosure purposes.

[0067] Other further embodiments and combinations of the present inventions mentioned above are, of course, possible. In this regard, it should be pointed out in particular that instead of connecting a VPU to a CPU using the VPU as the coprocessor, such a connection is also possible using the CPU as the coprocessor. Such a case is preferred in particular to have instruction structures recognized as having only minor parallelism and/or minor vector components processed sequentially as program parts in compiling. It is then possible in particular for the VPU to call the CPU via linklists or tables. The linklists or tables may contain information indicating where data is to be retrieved, at which address the CPU is able to access program information to be processed by it, etc. The inquiry as to whether the CPU is then finished with processing the program parts to be executed by it may in turn be handled via polling or the like. Here again, the operating system may be used to assign tasks to the CPU and/or to monitor the tasks to be executed by it. In principle, all the methods described here may thus be used for both linking a CPU to a VPU as a coprocessor as well as the converse. The only thing that may be important here is which type of linkage the operating system is designed for. It should be pointed out that it is possible in particular to provide an operating system that permits mutual linkage, i.e., in particular, optionally the CPU to the VPU and/or parts thereof and the converse. The latter is particularly advantageous when entire program blocks having mainly sequential portions are to be delivered by the VPU as host to the CPU as coprocessor and these program blocks still have strongly vectorial or parallel code in some cases which may be more or less transmitted back by the CPU, in particular in response to a current or predicted VPU load that has been determined.

1-14. (cancelled).
15. A data processing system, comprising: at least one reconfigurable processor linked to a standard processor.
16. The data processing system as recited in claim 15, wherein the reconfigurable processor is arranged as a coprocessor.
17. The data processing system as recited in claim 15, further comprising: at least one of a memory and an I/O linking device to link the standard processor and the reconfigurable processor together.
18. The data processing system as recited in claim 17, wherein the linking device is designed to link by at least one of: i) transmission of data, ii) status information, and iii) configurations.
19. The data processing system as recited in claim 15, wherein the standard processor and the reconfigurable processor are configured to jointly access at least a portion of a memory area.
20. The data processing system as recited in claim 15, further comprising: at least one of a unidirectional interrupt line and multidirectional interrupt line to synchronize between the standard processor and the reconfigurable processor.
21. The data processing system as recited in claim 15, wherein the reconfigurable processor includes at least one of a respective configuration loading arrangement and a respective configuration checking arrangement, for at least a partial new configuration without CPU input during run time.
22. The data processing system according to claim 15, wherein the data processing system is integrated on a chip.
23. A data processing method for a reconfigurable processor and a standard processor, comprising: calling up, by the standard processor, at least one of subprograms and subprogram parts, to be executed on the reconfigurable processor; and executing on the reconfigurable processor at least one of the subprograms and subprogram parts.
24. A data processing method for a reconfigurable processor and a standard processor, comprising: calling up, by the reconfigurable processor, at least one of subprograms and subprogram parts, to be executed on the standard processor; and executing, on the standard processor, the at least one of the subprograms and subprogram parts.
25. The data processing method as recited in claim 23, further comprising: generating control information when the reconfigurable processor is called up.
26. The data processing method as recited in claim 25, wherein the control information is generated as a table which includes at least one of: i) a configuration to be executed, ii) pointers to corresponding memory locations, iii) data sources, iv) operand memory locations, v) result data memory locations, and vi) pointers thereto.
27. The data processing method as recited in claim 23, further comprising: creating a link list when the coprocessor is called up; and processing, by the reconfigurable processor, the list.
28. The method as recited in claim 23, further comprising: exchanging information between the reconfigurable processor and the standard processor, the information including at least one of: i) a basic address of a load logic memory area for localizing configurations, ii) a basic address of data transfers, iii) I/O addresses of data transfers, iv) synchronization information pertaining to resetting, stopping or starting the reconfigurable processor, and v) status information on the reconfigurable processor, pertaining to at least one of errors and states of data processing.
29. The data processing method as recited in claim 23, further comprising: synchronizing the standard processor and the reconfigurable processor by polling at least one of status data and information, or by an interrupt control.